Lacrimae rerum. Memento mori. Memento vivere.

5-Factor Asset Pricing Portfolios

Factors can be seen as proxies for the characteristics of equities and other assets that explain performance and provide premiums as compensation for a relative risk, which makes them a useful lens for evaluating portfolios built on targeted segmentations. In the Fama-French 5-Factor Model for asset pricing, these risk factors cover market beta, market capitalization, book-to-market equity, operating profitability, and change in investment assets (recent performance momentum was excluded from this analysis). The data is accessed from the online library provided by Kenneth French, which publishes returns relevant to the research into asset pricing models by Eugene Fama and Kenneth French. Because the online library is segmented by type, the data was collected iteratively and stored as variables in a PKL file or as sheets in an XLSX file for all relevant types. An exploratory analysis was then performed to show the distributions, time-varying characteristics, and interactions affecting the realized returns in the data. The project is written in Python and primarily uses NumPy, Pandas, Matplotlib, Seaborn, Urllib, and Pickle.
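As a rough sketch of the collection step, a zipped CSV can be fetched with Urllib and unpacked in memory with the standard library. The function names and the assumption that each archive holds a single CSV member are illustrative and not taken from the project; the demonstration uses an in-memory archive so that it runs without network access.

```python
import io
import urllib.request
import zipfile


def download_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw bytes behind a URL (for example, a zipped CSV file)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def first_member_text(zip_bytes: bytes, encoding: str = "latin-1") -> str:
    """Return the decoded text of the first member of a ZIP archive.

    Assumption: the archive holds exactly one CSV file of interest.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = archive.namelist()[0]
        return archive.read(name).decode(encoding)


# Offline demonstration with a small archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "Date,Lo 30,Med 40,Hi 30\n2020,1.0,2.0,3.0\n")

demo_text = first_member_text(buffer.getvalue())
print(demo_text.splitlines()[0])
```

In practice, `first_member_text(download_bytes(url))` would be called once per source, with the URL taken from the metadata.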

Data Considerations

In the construction of the data, the returns are in USD and include dividends and capital gains for the period without fees or taxes and without continuous compounding (unless specified as annualized). The return from market beta is equal to the difference in return between a value-weighted market portfolio and the 1-month U.S. Treasury bill as the risk-free rate. For the value factor (High Minus Low), profitability factor (Robust Minus Weak), and investment factor (Conservative Minus Aggressive), the portfolios are sorted into 2 groups for market capitalization (with the upper 90% of equities with the highest and lower 10% of equities with the lowest market capitalization) and 3 groups respectively for book-to-market equity, operating profitability, or change in investment assets (with breakpoints at the 30th and 70th percentiles for the relevant multiples). The return from the size factor (Small Minus Big) is the average return using the equally-weighted combinations of groups which were formed using the value factor, profitability factor, and investment factor.

Definition used in the data for market beta relative to the risk-free rate:
\[\begin{gather*} \text{Mkt-Rf} = \text{Value-Weighted Market Portfolio} - \text{Risk-Free Rate} \end{gather*}\]
Definition used in the data for the size factor and categorized as Small Minus Big:
\[\begin{gather*} \begin{split} \text{SMB} &= \frac{1}{3} \left(\frac{1}{3} (\text{Small Value} + \text{Small Neutral} + \text{Small Growth}) - \frac{1}{3} (\text{Big Value} + \text{Big Neutral} + \text{Big Growth})\right. + \cdots \\ & \cdots + \frac{1}{3} (\text{Small Robust} + \text{Small Neutral} + \text{Small Weak}) - \frac{1}{3} (\text{Big Robust} + \text{Big Neutral} + \text{Big Weak}) + \cdots \\ & \cdots + \left.\frac{1}{3} (\text{Small Conserv.} + \text{Small Neutral} + \text{Small Aggr.}) - \frac{1}{3} (\text{Big Conserv.} + \text{Big Neutral} + \text{Big Aggr.})\right) \end{split} \end{gather*}\]
Definition used in the data for the value factor and categorized as High Minus Low:
\[\begin{gather*} \text{HML} = \frac{1}{2} (\text{Small Value} + \text{Big Value}) - \frac{1}{2} (\text{Small Growth} + \text{Big Growth}) \end{gather*}\]
Definition used in the data for the profitability factor and categorized as Robust Minus Weak:
\[\begin{gather*} \text{RMW} = \frac{1}{2} (\text{Small Robust} + \text{Big Robust}) - \frac{1}{2} (\text{Small Weak} + \text{Big Weak}) \end{gather*}\]
Definition used in the data for the investment factor and categorized as Conservative Minus Aggressive:
\[\begin{gather*} \text{CMA} = \frac{1}{2} (\text{Small Conservative} + \text{Big Conservative}) - \frac{1}{2} (\text{Small Aggressive} + \text{Big Aggressive}) \end{gather*}\]
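As a minimal sketch of the definitions above, the factor returns can be computed directly from the returns of the underlying portfolios. The returns below are hypothetical values chosen only for illustration, and the function names are not taken from the project. Only the sort on book-to-market equity is shown; the full size factor would pass the sorts on book-to-market equity, operating profitability, and change in investment assets together.

```python
import pandas as pd

# Hypothetical yearly returns (in percent) for the six size/value portfolios.
bm_sort = pd.DataFrame(
    {
        "Small Value": [12.0, -3.0],
        "Small Neutral": [10.0, -2.0],
        "Small Growth": [8.0, -4.0],
        "Big Value": [9.0, -1.0],
        "Big Neutral": [7.0, 0.0],
        "Big Growth": [6.0, -2.0],
    },
    index=[2001, 2002],
)


def high_minus_low(p: pd.DataFrame) -> pd.Series:
    """HML: average of the two value portfolios minus the two growth portfolios."""
    return 0.5 * (p["Small Value"] + p["Big Value"]) - 0.5 * (
        p["Small Growth"] + p["Big Growth"]
    )


def small_minus_big_leg(p: pd.DataFrame) -> pd.Series:
    """One leg of SMB: mean of the small portfolios minus mean of the big ones."""
    small = p.filter(like="Small").mean(axis=1)
    big = p.filter(like="Big").mean(axis=1)
    return small - big


def small_minus_big(*sorts: pd.DataFrame) -> pd.Series:
    """SMB: equally-weighted average of the legs from each supplied sort."""
    legs = [small_minus_big_leg(s) for s in sorts]
    return sum(legs) / len(legs)


hml = high_minus_low(bm_sort)
smb_bm = small_minus_big(bm_sort)  # a single leg; the full factor uses three sorts
print(hml)
print(smb_bm)
```

The same pattern as `high_minus_low` applies to RMW and CMA after swapping the column names for the robust/weak or conservative/aggressive portfolios.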

This analysis also considers portfolios constructed specifically to target various combinations of factors. This allows for evaluation of the intersection of independent sorts, with differing ranges between and within factors. For example, portfolios can be formed on the size factor and value factor using 2 groups for market capitalization and 3 groups for book-to-market equity to produce 6 portfolios covering equities with small market capitalizations, large market capitalizations, high book-to-market equity, low book-to-market equity, and neutral multiples. Similarly, portfolios can be formed on the size factor, profitability factor, and investment factor using 2 groups for market capitalization, 4 groups for operating profitability, and 4 groups for change in investment assets to produce 32 portfolios covering equities with small market capitalizations, large market capitalizations, robust operating profitability, weak operating profitability, conservative investments, aggressive investments, and neutral multiples.

With regard to the regions, countries are grouped based on their classification as developed markets or emerging markets (which generally follows classifications from MSCI) and relative location with the extent of the data varying based on availability. The developed markets include Australia, Austria, Belgium, Canada, Switzerland, Germany, Denmark, Spain, Finland, France, Great Britain, Greece, Hong Kong, Ireland, Italy, Japan, Netherlands, Norway, New Zealand, Portugal, Sweden, Singapore, and United States. The European regions include Austria, Belgium, Switzerland, Germany, Denmark, Spain, Finland, France, Great Britain, Greece, Ireland, Italy, Netherlands, Norway, Portugal, and Sweden. The Asia Pacific regions include Australia, Hong Kong, Japan, New Zealand, and Singapore. The emerging markets include Brazil, Chile, China, Colombia, Czech Republic, Egypt, Greece, Hungary, India, Indonesia, Malaysia, Mexico, Pakistan, Peru, Philippines, Poland, Qatar, Saudi Arabia, South Africa, South Korea, Taiwan, Thailand, Turkey, and United Arab Emirates.

Univariate Portfolio Sorts

With the available data in the United States, it is possible to construct univariate portfolios based on the individual factors (and market beta), where each portfolio has an increasing degree of exposure to the related factor. Using the relevant fundamental ratio, the equities are divided into the lower 30% of equities with the lowest values, the middle 40% of equities with average values (considered to be core, blend, or neutral), and the upper 30% of equities with the highest values. This gives an idealized overview of the expected characteristics and performance of long-only portfolios based on the corresponding construction (although without consideration for fees, taxes, or obstacles of implementation).
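A univariate sort of this kind can be sketched with Pandas as follows. The data is randomly generated and the column names are hypothetical; the breakpoints at the 30th and 70th percentiles mirror the Lower 30%, Middle 40%, and Upper 30% columns used for the univariate portfolios in this project, and a different grouping only requires different breakpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section of 200 equities for a single year.
rng = np.random.default_rng(0)
stocks = pd.DataFrame(
    {
        "book_to_market": rng.lognormal(mean=-0.5, sigma=0.6, size=200),
        "market_cap": rng.lognormal(mean=7.0, sigma=1.5, size=200),
        "ret": rng.normal(loc=8.0, scale=20.0, size=200),
    }
)

# Breakpoints at the 30th and 70th percentiles of the sorting ratio.
low, high = stocks["book_to_market"].quantile([0.3, 0.7])
stocks["group"] = pd.cut(
    stocks["book_to_market"],
    bins=[-np.inf, low, high, np.inf],
    labels=["Lower 30%", "Middle 40%", "Upper 30%"],
)

# Value-weighted return of each group: sum(ret * cap) / sum(cap).
weighted = (stocks["ret"] * stocks["market_cap"]).groupby(
    stocks["group"], observed=True
).sum()
total_cap = stocks["market_cap"].groupby(stocks["group"], observed=True).sum()
portfolio_returns = weighted / total_cap

group_sizes = stocks["group"].value_counts()
print(group_sizes)
print(portfolio_returns)
```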

Yearly returns for univariate portfolios of size from the Fama-French 5-Factor Model in the United States:
Yearly returns for univariate portfolios of value from the Fama-French 5-Factor Model in the United States:
Yearly returns for univariate portfolios of profitability from the Fama-French 5-Factor Model in the United States:
Yearly returns for univariate portfolios of investment from the Fama-French 5-Factor Model in the United States:

Trivariate Portfolio Sorts

To identify interactions, portfolios can be constructed to target combinations of factors through threefold-variation sorts (with market beta being common to each portfolio). For size, value, and profitability, the portfolios formed include the lower and upper halves of market capitalizations, lower and upper quarters of book-to-market equity, and lower and upper quarters of operating profitability. For size, value, and investment, the portfolios formed include the lower and upper halves of market capitalizations, lower and upper quarters of book-to-market equity, and lower and upper quarters of change in investment assets. For size, profitability, and investment, the portfolios formed include the lower and upper halves of market capitalizations, lower and upper quarters of operating profitability, and lower and upper quarters of change in investment assets. Unfortunately, fourfold-variation sorts are not currently available.
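The selection of the lower and upper quarters from a 2x4x4 sort can be illustrated by enumerating the grid of 32 portfolios. The sketch below assumes that the file lists the portfolios with size as the outermost sort and the third characteristic as the innermost, with the date in column 0; the group labels are hypothetical. Under that assumption, the corner portfolios land on the column indices 1, 4, 13, 16, 17, 20, 29, and 32.

```python
from itertools import product

size = ["Small", "Large"]
value = ["BM1", "BM2", "BM3", "BM4"]   # quarters of book-to-market equity
profit = ["OP1", "OP2", "OP3", "OP4"]  # quarters of operating profitability

# All 2 x 4 x 4 = 32 combinations, with the last sort varying fastest.
grid = list(product(size, value, profit))

# Column position of each portfolio, assuming column 0 holds the date.
position = {combo: index + 1 for index, combo in enumerate(grid)}

# Keep only the lowest and highest quarter of each characteristic.
corners = [
    position[(s, v, p)]
    for s in size
    for v in (value[0], value[-1])
    for p in (profit[0], profit[-1])
]
print(len(grid))
print(corners)
```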

An interesting finding is that, although the size factor in isolation often does not provide a robust premium across time in different regions, the magnitudes of other factors tend to be larger for equities with small market capitalizations. Taking the value factor as an example, there is a premium for equities with cheap valuations over equities with expensive valuations, but there is also a premium for equities with small market capitalizations and cheap valuations over equities with large market capitalizations and cheap valuations. A proposed explanation, supported by some evidence, is that the lack of a premium from the size factor in isolation is largely due to the volatile performance of equities with small market capitalizations and low quality (which may be indirectly removed when considering the size factor in combination with other factors due to their very poor multiples). Thus, if considering the size factor, it is reasonable to also account for quality to improve its efficacy, robustness, consistency, and stability across time.

Yearly returns for trivariate portfolios of size, value, and profitability from the FF5 in developed markets:
Yearly returns for trivariate portfolios of size, value, and investment from the FF5 in developed markets:
Yearly returns for trivariate portfolios of size, profitability, and investment from the FF5 in developed markets:

Bivariate Portfolio Sorts

The data for emerging markets for constructing portfolios to target combinations of factors is only available as twofold-variation sorts (with market beta being common to each portfolio). For size and value, the portfolios formed include the lower and upper halves of market capitalizations and lower and upper thirds of book-to-market equity. For size and profitability, the portfolios formed include the lower and upper halves of market capitalizations and lower and upper thirds of operating profitability. For size and investment, the portfolios formed include the lower and upper halves of market capitalizations and lower and upper thirds of change in investment assets. An aspect to keep in mind is that the exposure to factors may not necessarily be the same for related sorts given the sort by size. There are typically more equities with small market capitalizations than with large market capitalizations, which allows for greater variety in the characteristics of these equities. As a consequence, for example, a portfolio targeting the lower half of market capitalizations and upper third of book-to-market equity may have higher exposure to the value factor than a portfolio targeting the upper half of market capitalizations and upper third of book-to-market equity. Differences in performance may then be attributable to this difference in exposure rather than the difference in size.

Yearly returns for bivariate portfolios of size and value from the FF5 in emerging markets:
Yearly returns for bivariate portfolios of size and profitability from the FF5 in emerging markets:
Yearly returns for bivariate portfolios of size and investment from the FF5 in emerging markets:
Yearly returns for bivariate portfolios of value and profitability from the FF5 in emerging markets:
Yearly returns for bivariate portfolios of value and investment from the FF5 in emerging markets:
Yearly returns for bivariate portfolios of profitability and investment from the FF5 in emerging markets:

Software Architecture Overview

The project was designed with an object-oriented approach using a class for each part. The metadata class controls the selection and reference of data through a CSV file that specifies a title to assign for display, the region for which the data is applicable, a label to use as the variable name, the URL from which to access the data, and the indices of the relevant sections and columns within the data. These indices need to be manually assigned based on which sections should be extracted, as well as which columns should be extracted within each section; labels are also required and form part of the dataframes. Specifically, the columns of the CSV file are Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, ColumnsName, and ColumnsIndex. Although purposefully designed to work with the online library provided by Kenneth French, the subsequent operations would work with any file of the same structure.

Example of a CSV detailing the information to generate an object using the metadata class:
				Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, ColumnsName, ColumnsIndex
				Portfolios Size, United States, size_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Lower 30%, Middle 40%, Upper 30%", "0, 2, 3, 4"
				Portfolios Value, United States, value_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Lower 30%, Middle 40%, Upper 30%", "0, 2, 3, 4"
				Portfolios Profitability, United States, profitability_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_OP_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Lower 30%, Middle 40%, Upper 30%", "0, 1, 2, 3"
				Portfolios Investment, United States, investment_us, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Portfolios_Formed_on_INV_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Lower 30%, Middle 40%, Upper 30%", "0, 1, 2, 3"
				Portfolios Size-Value-Profitability, Developed Markets, size_value_profitability_dv, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/32_Portfolios_ME_BEME_OP_2x4x4_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM LoOP, Small LoBM HiOP, Small HiBM LoOP, Small HiBM HiOP, Large LoBM LoOP, Large LoBM HiOP, Large HiBM LoOP, Large HiBM HiOP", "0, 1, 4, 13, 16, 17, 20, 29, 32"
				Portfolios Size-Value-Investment, Developed Markets, size_value_investment_dv, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Developed_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM LoINV, Small LoBM HiINV, Small HiBM LoINV, Small HiBM HiINV, Large LoBM LoINV, Large LoBM HiINV, Large HiBM LoINV, Large HiBM HiINV", "0, 1, 4, 13, 16, 17, 20, 29, 32"
				Portfolios Size-Profitability-Investment, Developed Markets, size_profitability_investment_dv, http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Developed_32_Portfolios_ME_INV(TA)_OP_2x4x4_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoOP LoINV, Small LoOP HiINV, Small HiOP LoINV, Small HiOP HiINV, Large LoOP LoINV, Large LoOP HiINV, Large HiOP LoINV, Large HiOP HiINV", "0, 1, 4, 13, 16, 17, 20, 29, 32"
				Portfolios Size-Value, Emerging Markets, size_value_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoBM, Small HiBM, Large LoBM, Large HiBM", "0, 1, 3, 4, 6"
				Portfolios Size-Profitability, Emerging Markets, size_profitability_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_6_Portfolios_ME_OP_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoOP, Small HiOP, Large LoOP, Large HiOP", "0, 1, 3, 4, 6"
				Portfolios Size-Investment, Emerging Markets, size_investment_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_6_Portfolios_ME_INV_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, Small LoINV, Small HiINV, Large LoINV, Large HiINV", "0, 1, 3, 4, 6"
				Portfolios Value-Profitability, Emerging Markets, value_profitability_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_4_Portfolios_BE-ME_OP_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, LoBM LoOP, LoBM HiOP, HiBM LoOP, HiBM HiOP", "0, 1, 2, 3, 4"
				Portfolios Value-Investment, Emerging Markets, value_investment_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_4_Portfolios_BE-ME_INV_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, LoBM LoINV, LoBM HiINV, HiBM LoINV, HiBM HiINV", "0, 1, 2, 3, 4"
				Portfolios Profitability-Investment, Emerging Markets, profitability_investment_em, https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Emerging_Markets_4_Portfolios_OP_INV_CSV.zip, 2, Value-Weight Year, data_portfolio_year, "Date, LoOP LoINV, LoOP HiINV, HiOP LoINV, HiOP HiINV", "0, 1, 2, 3, 4"
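A file in this layout can be read with the standard `csv` module, as in the sketch below. It is not the project's metadata class; the helper name `split_list` and the choice to return plain dictionaries are assumptions made for illustration. The `skipinitialspace=True` option matters here because each delimiter is followed by a space, including before the quoted fields.

```python
import csv
import io

# One header row and one source row in the layout shown above.
metadata_text = (
    "Title, Region, Label, URL, SetsIndex, SetsTitle, SetsLabel, "
    "ColumnsName, ColumnsIndex\n"
    "Portfolios Size-Value, Emerging Markets, size_value_em, "
    "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
    "Emerging_Markets_6_Portfolios_ME_BE-ME_CSV.zip, 2, Value-Weight Year, "
    'data_portfolio_year, "Date, Small LoBM, Small HiBM, Large LoBM, Large HiBM", '
    '"0, 1, 3, 4, 6"\n'
)


def split_list(cell: str) -> list:
    """Split a quoted, comma-separated cell into stripped items."""
    return [item.strip() for item in cell.split(",")]


reader = csv.DictReader(io.StringIO(metadata_text), skipinitialspace=True)
sources = []
for row in reader:
    row["ColumnsName"] = split_list(row["ColumnsName"])
    row["ColumnsIndex"] = [int(item) for item in split_list(row["ColumnsIndex"])]
    sources.append(row)

print(sources[0]["Label"])
print(sources[0]["ColumnsName"])
print(sources[0]["ColumnsIndex"])
```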

The metadata class treats each row in the CSV file as a source to be used. For each source, an object is created through the source class to handle the retrieval and extraction of the target data. An object created from the source class consists of identification properties from the associated metadata class, as well as the raw and processed data as dataframes. The raw data is simply the retrieved data formatted numerically, with NaN values occupying incompatible records (a downloaded version of the original data is also optionally stored). This results in each section being a continuous set of numeric values with groups of NaN values between them, which allows for the simple identification of sections based on the alignment of these values. The processed data then extracts the relevant sections and labels them based on the variable names from the metadata. The relevant columns for the corresponding sets are then simply selected in each of the extracted sets (in all cases, it is necessary to select the column for dates, which is formatted as datetime values).
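The identification of sections from runs of numeric rows can be sketched as follows. The small table stands in for a retrieved file and is entirely hypothetical, as are the variable names; the sketch also counts sections from zero, which may differ from the convention used by SetsIndex in the project.

```python
import pandas as pd

# Hypothetical stand-in for a retrieved file: two titled sections of returns.
original = pd.DataFrame(
    [
        ["Value Weight Returns -- Annual", None, None],
        ["", "Lo 30", "Hi 30"],
        ["2001", "1.5", "2.5"],
        ["2002", "-0.5", "3.0"],
        ["Equal Weight Returns -- Annual", None, None],
        ["", "Lo 30", "Hi 30"],
        ["2001", "2.0", "1.0"],
        ["2002", "0.5", "-1.0"],
        ["2003", "4.0", "2.0"],
    ]
)

# Raw data: everything numeric, with NaN wherever the text is incompatible.
raw = original.apply(pd.to_numeric, errors="coerce")

# A row belongs to a section when every cell is numeric; each non-data row
# increments the counter, so contiguous data rows share one section key.
is_data = raw.notna().all(axis=1)
section_key = (~is_data).cumsum()
sections = [group for _, group in raw[is_data].groupby(section_key[is_data])]

# Processed data: pick one section, select columns by index, and label them.
processed = sections[1].iloc[:, [0, 2]].copy()
processed.columns = ["Date", "Upper 30%"]
processed["Date"] = pd.to_datetime(
    processed["Date"].astype(int).astype(str), format="%Y"
)
processed = processed.reset_index(drop=True)
print(processed)
```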

Illustration of the original, raw, and processed data and transformations in each step:

For analysis, several plots are available depending on the type of the source. For the premiums from individual factors, it is possible to visualize the history of realized returns since inception with moving averages for various lengths of time; distribution of realized returns as a histogram and kernel density estimation with identification of several metrics (such as mean, median, maximum, minimum, standard deviation, skewness, and kurtosis); and association internally between the factors with the Pearson, Kendall, and Spearman correlation coefficients and scatter plots showing a linear regression model.

For the portfolios constructed based on factors, it is possible to visualize the history of realized returns since inception with moving averages for various lengths of time; distribution of realized returns as a histogram and kernel density estimation with identification of several metrics (such as mean, median, maximum, minimum, standard deviation, skewness, and kurtosis); cumulative realized returns with vintages beginning from each point in time and progressing through time on a linear or log scale with comparison against fixed returns and identification of several metrics (such as average returns for various lengths of time, time until a cumulative return of 10 times, and time for drawdowns until a positive return); and annualized returns from the compound annual growth rate from each point in time and progressing through time with a heatmap to illustrate the impacts of events and identification of several metrics (such as dispersion of results for various lengths of time - although the results appear to converge, a small compounding difference can have drastically divergent outcomes over long periods of time, as seen with the variation of cumulative realized returns).
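The quantities behind the moving-average and annualized-return plots can be sketched in a few lines. The yearly returns are hypothetical and the function name `cagr_matrix` is not from the project; the resulting matrix, with one row per starting year and one column per ending year, is the kind of table a heatmap could be drawn from.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly returns in percent.
returns = pd.Series([10.0, -5.0, 20.0, 0.0], index=[2001, 2002, 2003, 2004])

# Moving average of realized returns over a 2-year window.
moving_average = returns.rolling(window=2).mean()

# A few of the distribution metrics mentioned above.
metrics = {
    "mean": returns.mean(),
    "median": returns.median(),
    "std": returns.std(),
    "skewness": returns.skew(),
}


def cagr_matrix(yearly_returns: pd.Series) -> pd.DataFrame:
    """Annualized return (percent) from each start year to each later end year."""
    growth = 1.0 + yearly_returns / 100.0
    years = growth.index
    out = pd.DataFrame(np.nan, index=years, columns=years)
    for i, start in enumerate(years):
        cumulative = growth.iloc[i:].cumprod()          # vintage beginning at `start`
        periods = np.arange(1, len(cumulative) + 1)     # holding period in years
        out.loc[start, cumulative.index] = (
            cumulative.to_numpy() ** (1.0 / periods) - 1.0
        ) * 100.0
    return out


annualized = cagr_matrix(returns)
print(moving_average)
print(annualized.round(2))
```

Cells below the diagonal stay NaN because a vintage cannot end before it begins.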

Finally, for organization, a collection class has been created to hold and manage the sources as properties. This allows for designated analysis based on the types of the sources (such as yearly data compared to monthly data or individual factor premiums compared to portfolio constructions), as well as saving the dataframes of the sources as separate sheets in an XLSX file - alternatively, it is possible to easily save the entire collection class as a PKL file through the Pickle module.
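The two storage options can be sketched as below. A plain dictionary of dataframes stands in for the collection class, and all function names are hypothetical. Writing the workbook requires an Excel engine such as openpyxl to be installed, so only the Pickle round trip is executed here. Excel limits sheet names to 31 characters, so longer labels (for example, `size_profitability_investment_dv` has 32) are truncated in this sketch.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

EXCEL_SHEET_NAME_LIMIT = 31  # maximum length of a sheet name in Excel


def sheet_name(label: str) -> str:
    """Truncate a label so that it is a valid sheet name."""
    return label[:EXCEL_SHEET_NAME_LIMIT]


def save_workbook(collection: dict, path) -> None:
    """Write each dataframe to its own sheet (needs an engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for label, frame in collection.items():
            frame.to_excel(writer, sheet_name=sheet_name(label), index=False)


def save_pickle(collection: dict, path) -> None:
    """Serialize the whole collection to a single PKL file."""
    with open(path, "wb") as handle:
        pickle.dump(collection, handle)


def load_pickle(path) -> dict:
    """Restore a collection; only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical collection holding a single processed source.
collection = {
    "size_us": pd.DataFrame(
        {
            "Date": pd.to_datetime(["2001-01-01", "2002-01-01"]),
            "Lower 30%": [1.5, -0.5],
        }
    )
}

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "collection.pkl"
    save_pickle(collection, target)
    restored = load_pickle(target)

print(restored["size_us"].equals(collection["size_us"]))
```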

Latest public version available online through the Git repository hosted on GitLab:
X